ABSTRACT

To study the relationship between inferences made on 
standardized reading tests and item difficulty, 50 items on the 
reading comprehension section of the Metropolitan Achievement Test 
were analyzed independently in this study by two raters using four 
general categories of inferences: (1) reference inferences, (2) 
between proposition inferences, (3) source inferences and (4) 
metalinguistic inferences. The items were in standard reading 
comprehension test format (reading passages followed by multiple 
choice items based on the passage). An inference was operationally 
defined as the mental process of inducing or deducing, as cued by the 
test item, information not explicitly stated in the text. No
significant relationship was found between inference types and item 
difficulty; however, a significant relationship was found between a 
general measure of raw amount of information processed and item 
difficulty. Although most reading skill hierarchies assume that 
inferential cognitive operations are inherently more difficult than 
non-inferential operations, these findings suggest that either
inferential cognitive operations are not inherently more difficult 
than non-inferential cognitive operations, or that the inferential 
cognitive operations have been internalized at the level of 
automaticity for school-aged test takers. (Twenty references are 
attached and two tables of data are included.) (NH) 
Abstract

Most reading skill hierarchies assume that inferential
cognitive operations are inherently more difficult than
non-inferential operations, yet there has been little empirical
study of the relationship between inference types and item 
difficulty on standardized reading test items. In this
study 50 items from the reading comprehension section of the 
Metropolitan Achievement Test were analyzed for four general 
categories of inference. Item difficulty (p values) was 
then regressed on inference types. No significant
relationship was found between inference types and item
difficulty; however, a significant relationship was found
between a general measure of raw amount of information 
processed and item difficulty. These findings suggest that 
either inferential cognitive operations are not inherently 
more difficult than non-inferential cognitive operations or 
that the inferential cognitive operations in this study have 
been internalized at the level of automaticity for
school-aged test takers.
Although there has been a considerable amount of study
of the different types of inferences made while reading
different forms of written discourse (Crothers, 1979;
Kintsch, 1979; Warren, Nicholas and Trabasso, 1979), there
has been little research done on the types of inferences
made on standardized reading tests and their relationship to
item difficulty.

In a study of surface level linguistic features and 
their relationship to item difficulty on standardized 
reading tests, Drum, Calfee and Cook (1980) found that such
surface structure elements as word length, prepositional 
density and syntactic density were significantly related to 
item difficulty and accounted for as much as three fourths 
of the variance. Commonly, surface level linguistic measures
are associated with non-inferential cognitive operations
used to create a micro-structure representation of
information explicitly stated (Kintsch, 1979). Under this
interpretation of surface level linguistic features, the 
Drum, Calfee and Cook findings would seem to imply that 
inferential cognitive operations are not strongly related to 
item difficulty in reading test items. This conclusion is 
supported by DiStefano and Valencia (in press) who found no 
significant relationship between the type of reading 
question (literal versus inferential) and item difficulty on 
items administered by the National Assessment of Educational 
Progress. However, DiStefano and Valencia did not study 
different types of inferences and included very few 
inference questions among their sample of items. Hence, the
small sample of inferential items and the collapsing of all
inference types into one category could have masked a
relationship between certain inference types and item
difficulty.

The apparent lack of relationship between inferential 
cognitive operations and item difficulty is not consistent 
with most hierarchies of cognition, especially those that 
deal with the processing of linguistic information 
(Rosenshine, 1980). That is, since most models of the
reading process either implicitly or explicitly assert that
the process of answering inferential questions is more
difficult than the process of answering literal questions,
one would assume that the number and type of inferences
required on a reading test item would be a stronger
predictor of an item's difficulty than the number and type
of non-inferential cognitive operations. This was the basic
finding of Hillocks and Ludlow (1984) in their study of
student responses to questions (both literal and
inferential) from relatively long narrative and expository
passages. In fact, Hillocks and Ludlow found that different
inference types could be arranged in a hierarchic fashion 
relative to difficulty. 

Given the discrepant findings of the Hillocks and
Ludlow study versus those of the Drum, Calfee and Cook and
the DiStefano and Valencia studies, a question as yet
unanswered is "to what extent are specific inference types
related to item difficulty on standardized reading tests?"
The purpose of this study was to answer that question. More
specifically, this study sought to answer the research
question: "What is the relationship between the type of
inferences in standardized test items and item difficulty?"

METHOD 

Fifty items from the reading section of the 
Metropolitan Achievement Tests, Intermediate Level, Form JS 
(Prescott, Balow, Kogan and Farr, 1978) were analyzed for 
four general categories of inference, each with
subcategories. The items were in standard reading
comprehension test format (reading passages followed by
multiple choice items based on the passage).

An inference was operationally defined as the mental
process of inducing or deducing, as cued by the test item,
information not explicitly stated in the text. The four
general categories of inferences studied were: 1) reference 
inferences, 2) between proposition inferences, 3) source 
inferences and 4) metalinguistic inferences. 

Inference Categories 

Reference inferences are those in which a reader must
infer that a word, phrase, or a syntactic cue in an item
refers to a specific word, proposition or set of
propositions in the reading passage accompanying the item.
For example, assume a reading test item were written in the
following way:

The young girl in the story was late for: 

a) lunch 

b) school 

c) a tea party 

d) baseball practice

If the passage to which the item applied did not use the 
term "young girl" but referred to her by name (e.g., Jana) 
and forms of the third person pronoun (e.g., she, her), the
reader would have to infer that "young girl" referred to 
Jana. 

There are a number of models for and ways of describing 
the different types of reference inferences that can be made 
(Halliday and Hasan, 1976; Meyer, 1975; Turner and Greene, 
1977). In this study, four subcategories of reference
inferences were analyzed: 

1) reference by syntax (The syntactic structure
of an item signals information in the text.)

2) reference by synonym (A synonym is used in the item
for a term in the text.)

3) reference by general term (A superordinate term
is used in the item for a subordinate term in
the text.)

4) reference by specific term (A subordinate term
is used in the item for a superordinate term in
the text.)

Pronominal reference, perhaps the most common form of
reference in oral and written discourse, was not included in
the analysis because virtually every occurrence of a pronoun
used in a test item was accompanied by a pronoun in the
text. Consequently, the reader was not required to make an
inference in the item to get back to the text because the 
relationship between the pronoun and its antecedent would 
have already been established as a result of reading the 
text. 

Between proposition inferences occur when the reader 
must infer a relationship between propositions which is not 
explicitly stated in the text. For example, assume that an
item were written in the following way:

The young girl was late for baseball practice
because:

a) she had to finish her paper route

b) she had to stay after school

c) she was not feeling well

d) she didn't feel like practicing that day.

Here the reader must make a connection between the
proposition "the young girl was late for baseball practice"
and one of the four propositions listed in the alternatives. The
connection the reader must make is one of causality. 
Presumably, in the text to which this item refers, one of 
the four alternative propositions was stated as a cause for 
the proposition in the stem. Meyer (1975) refers to such 
relationships between propositions as rhetorical predicates. 
Halliday and Hasan (1976) refer to them as conjunctives. If 
there were no explicit linguistic signal in the text (e.g.,
use of the conjunction because between the two propositions) 
the reader would have to infer that the proposition in the 
stem and the correct alternative did, in fact, have a causal 
relationship. 

Again there are a number of ways to describe the 
different types of relationships and, consequently, 
inferences that can be made between propositions. Marzano,
Hagerty, Valencia and DiStefano (1987) have determined that
between proposition relationships commonly described in most
propositionally based systems of language analysis can be
classified into four major categories: causal relationships,
additive relationships, comparative relationships and
temporal relationships. These can exist between
propositions explicitly stated in the text and between those
not stated in the text. In other words, two propositions
may have a relationship when both are explicitly stated in
the text or when only one is stated in the text. These two
characteristics (type of relationship between propositions
and implicit or explicit presence in the text) were
collapsed in this study to create two general categories of
between proposition inferences:

- inferences requiring the reader to identify a
causal, additive, temporal or comparative
relationship between two propositions explicitly
stated in the text which do not have an
explicit linguistic marker signaling the
relationship.

- inferences requiring the reader to identify a
causal, additive, temporal or comparative
relationship between two propositions, one of
which is not explicitly stated in the text.

Source inferences are those in which the reader must
infer some characteristic about the author or the intention
of the author from reading the text. For example, an item
which referred to information the author must have known or
could not have known would require an inference about
"source." Inferences about source include aspects of
"theme" as described by Halliday (1967) and "staging" as
described by Grimes (1972).

Metalinguistic inferences are those which require the
reader to know some specific characteristics and conventions
of written discourse. For example, an item which assumes
the reader knows that a story will generally include a
setting, an initiating event, a climax and a conclusion
requires metalinguistic inferences. De Beaugrande (1980)
has identified eight types of metalinguistic structures
common to written discourse. These are: descriptive,
argumentative, literary, poetic, scientific, didactic and
conversational. Van Dijk (1980) has identified four types
of metalinguistic structures, all of which are covered by de
Beaugrande's categories. Within the present study,
inferences about any of the above structures were coded as
metalinguistic.

Analysis of Items 

The fifty items on the reading comprehension section of 
the Metropolitan Achievement Tests were analyzed 
independently by two raters using the inference categories 
described above. Each item was scored in a dichotomous 
fashion (presence of inference type versus lack of presence 
of inference type) for each of the four general categories 
of inference. Inter-rater reliabilities on the initial 
analysis ranged from .82 for source inferences to .96 for 
reference inferences as measured by Pearson product moment 
correlations. Although these results indicated substantial 
agreement, all disagreements were submitted to a third 
rater. The third rater's agreement with one of the primary
raters was accepted as the correct coding. 
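
The coding and reliability procedure described above can be sketched in a few lines of Python. The sketch is illustrative only: the rater codes are invented for ten hypothetical items, and the function names (pearson_r, adjudicate) are not taken from the original study. With dichotomous codes, the Pearson product moment correlation is equivalent to the phi coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def adjudicate(rater_a, rater_b, rater_c):
    """Keep codes on which the two primary raters agree; otherwise take the
    third rater's code, which necessarily matches one primary rater because
    the codes are dichotomous."""
    return [a if a == b else c for a, b, c in zip(rater_a, rater_b, rater_c)]


# Hypothetical codes for ten items (1 = inference type present, 0 = absent).
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
rater_c = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # consulted only on disagreements

reliability = pearson_r(rater_a, rater_b)
final_codes = adjudicate(rater_a, rater_b, rater_c)
print(f"inter-rater r = {reliability:.2f}")
print("final codes:", final_codes)
```

In this invented example the two primary raters disagree on one item, and the third rater's code settles it.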
Analysis of Data

Item difficulty (item p values) was regressed on the
four general inference types using a stepwise multiple
regression analysis. Table 1 reports the means and standard
deviations for each of the five variables in the equation. 
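
As a rough illustration of regressing p values on dichotomous inference codes, the following sketch fits an ordinary least-squares equation through the normal equations. It is a simplification under stated assumptions: all four predictors are entered at once rather than stepwise, the eight items and their p values are hypothetical, and the helper names (solve, ols) are invented for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return least-squares coefficients [constant, b1, ..., bk]."""
    x = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical items coded as (reference, between proposition, source,
# metalinguistic), with 1 = inference type present and 0 = absent.
items = [
    (1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1),
    (0, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0),
]
# Hypothetical item difficulties (percent of examinees answering correctly).
p_values = [62.0, 55.0, 61.0, 50.0, 59.0, 56.0, 47.0, 54.0]

coefficients = ols(items, p_values)
predicted = [
    sum(c * v for c, v in zip(coefficients, [1.0] + list(row)))
    for row in items
]
residuals = [y - p for y, p in zip(p_values, predicted)]
print("constant and B weights:", [round(c, 3) for c in coefficients])
```

A stepwise procedure would add or drop predictors one at a time according to an entry criterion; the sketch omits that step and shows only how the B weights for a fixed set of predictors are obtained.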



Table 1 here 

Table 2 contains the regression coefficients and F values 
for the variables in the equation. 

Table 2 here



RESULTS 

Table 1 indicates that the fifty items contained many 
inferences of all types. Reference inferences were the most 
frequent, occurring in about nine out of ten items. Source 
inferences were the least frequent. As Table 2 indicates,
none of the inference types were significant (.05 level)
predictors of item difficulty within the multiple regression
equation. The multiple R for the equation was .33 and had a
probability of .55.
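
The B and Beta columns of Table 2 are linked by the standard relation Beta = B x (SD of predictor) / (SD of item difficulty). The short check below applies that relation to the values published in Tables 1 and 2; the variable and function names are introduced here for illustration only. Small differences in the third decimal place are to be expected, since the published standard deviations are rounded to two places.

```python
# Values transcribed from Tables 1 and 2 of this paper.
SD_ITEM_DIFFICULTY = 15.61

predictors = {
    # name: (B, SD of predictor, reported Beta)
    "reference": (0.962, 0.61, 0.038),
    "source": (-0.636, 0.25, -0.010),
    "metalinguistic": (-11.189, 0.40, -0.289),
    "between proposition": (-5.027, 0.63, -0.202),
}


def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized weight B to a standardized Beta."""
    return b * sd_x / sd_y


recomputed = {
    name: standardized_beta(b, sd_x, SD_ITEM_DIFFICULTY)
    for name, (b, sd_x, _) in predictors.items()
}

for name, (_, _, reported) in predictors.items():
    print(f"{name:20s} reported {reported:+.3f}  recomputed {recomputed[name]:+.3f}")
```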

The lack of significance of the multiple R was
interpreted as an indication that the overall amount of
inferences made on a reading test item is not a significant
predictor of item difficulty. The lack of significance of
any predictor variable in the equation was interpreted as an
indication that no single type of inference included in this
study is significantly related to item difficulty. In other
words, the findings of this study imply that the number and
type of inferences made on reading comprehension test items
have little relationship to the difficulty of items.

TABLE 1

Means and standard deviations for variables in the equation.

Variable                        N         M        SD

Item difficulty (p value)      50     57.20     15.61
Reference                      50       .90       .61
Between proposition            50       .43       .63
Source                         50       .07       .25
Metalinguistic                 50       .10       .40

TABLE 2

Regression coefficients, F values and significance levels.

Variable                      B      Beta         F  Significance

Reference                  .962      .038      .034          .855
Source                    -.636     -.010      .003          .958
Metalinguistic          -11.189     -.289      2.17          .153
Between proposition      -5.027     -.202       .92          .346
Constant                 59.673              74.602          .000

To test whether the difficulty of the items was a 
function of non-inferential rather than inferential 
cognitive operations, two other predictor variables were
entered into the equation: 1) passage length, and 2) depth
of answer. Passage length is commonly used to measure the
amount of raw information and surface complexity, both 
syntactic and semantic, of written and oral information (Lee 
and Canter, 1971; O'Hare, 1972). Hence, passage length can 
be considered to be a general measure of many of the surface 
level characteristics studied by Drum, Calfee and Cook 
(1980). Depth of answer is an adaptation of Meyer's (1975)
notion of hierarchic propositional structure within written
discourse and Christensen's (1965) notion of sentence
weights to describe levels of subordination among sentences
within paragraphs. It was one of the primary non-inferential
measures used in the DiStefano and Valencia (in press)
study. As used in this study, depth of answer can be
considered to be a measure of the amount of non-inferential
cognitive processing one performs to identify superordinate
and subordinate relationships among propositions explicitly
stated in a text. 

When these two variables were entered into the equation,
the multiple R was raised to .59 (which was still not 
significant) solely on the predictive strength of the 
variable, passage length, which had a bivariate correlation 
of -.47 with item difficulty and was the only significant 
predictor of item difficulty within the equation. 
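
Squaring a correlation gives the proportion of variance in item difficulty accounted for, which puts the coefficients reported in this section on the same footing as the "three fourths of the variance" figure cited from Drum, Calfee and Cook. The sketch below only restates the reported coefficients in that form; the labels and the function name are introduced here for illustration.

```python
def variance_explained(r):
    """Proportion of variance accounted for by a correlation coefficient."""
    return r ** 2


# Correlations reported in the RESULTS section of this paper.
reported = {
    "four inference types (multiple R)": 0.33,
    "inference types plus passage length and depth of answer (multiple R)": 0.59,
    "passage length alone (bivariate r)": -0.47,
}

shares = {
    label: round(variance_explained(r), 2) for label, r in reported.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.2f}")
```

Read this way, the four inference types together account for roughly a tenth of the variance in item difficulty, while passage length alone accounts for roughly a fifth.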

DISCUSSION 

The results of the present study were not consistent 
with those of Hillocks and Ludlow (1984) but were generally 
consistent with those of Drum, Calfee and Cook (1980) and 
DiStefano and Valencia (in press). The amount of inference
in general and the specific types of inferences were not 
found to be significant predictors of item difficulty. 
Rather, item difficulty was more a function of the raw 
amount of information which must be processed. This would 
be reflected in such gross measures of surface level 
linguistic characteristics as passage length. 

These findings can be explained by Johnson-Laird's
(1983) theory that task difficulty, presumably within any
domain, is primarily a function of the amount of information
that must be processed and not the inherent difficulty of
the cognitive operations performed on the information. In
other words, inferential cognitive operations are not
inherently more difficult than non-inferential operations.
Task difficulty, then, is not a function of the types of 
thinking involved but the sheer amount of information that 
must be processed and the number of alternatives that must 
be kept in working memory. 

These findings can also be explained using LaBerge and
Samuels' (1974) notion of automaticity and its relationship
to task difficulty. They state that once a set of cognitive
operations has been internalized (learned at the level of
automaticity), the operations require little of the capacity
of working memory and consequently are not a major factor
relative to the difficulty of tasks in which they are used.
This position is also taken by Anderson (1983), who states
that skill or procedural learning progresses through at
least three stages, with the last being the autonomous stage,
that at which the procedure can be executed with little or
no conscious attention. Relating the theory of automaticity
to the present study, we might conclude that the difficulty
of reading comprehension items (and, presumably, other types
of items) is a function of the extent to which the cognitive
processes involved have been internalized and can,
consequently, be executed automatically. It might be the
case, then, that the inferential cognitive operations
involved in reading comprehension test items are inherently
more difficult than non-inferential operations; however,
those inferential operations have simply been internalized to
the level of automaticity by school-aged test takers.
If either of these interpretations is correct, it would
imply that skill hierarchies (and, consequently, the 
distinction between inferential and literal items) have 
little practical validity as applied to standardized reading 
test items. Further research must be done to reconcile the 
discrepancy between this study and that of Hillocks and 
Ludlow. Perhaps the hierarchy of cognitive operations they 
identified is valid for relatively long blocks of discourse 
and/or for certain types of discourse. In other words, 
perhaps skill hierarchies are not independent, invariant 
constructs but change depending on the type and amount of 
information processed and the level of skill of the person 
engaged in the task. 